Goals

  1. Load in phyloseq data with rooted tree.
  2. Evaluate sequencing depth and remove sample.
  3. Normalize the read counts between samples.
  4. Calculate community dissimilarities. Numbers between 0 and 1. If 0, completely similar versus if they are 1, then they are completely dissimilar.
    1. Sorensen: Presence/ Absence. Weighted by number of shared taxa. Shared species as a binary-valye. Abundance-unweighted.
    2. Bray-Curtis: Relative abundance. Weighted by number of shared taxa. Shared abundant species: abundance weighted.
    3. (Abundance) Weighted UNIFRAC: Consider abundant species and where they fall on the tree.
  5. Visualize the community data with two unconstricted ordinations:
    1. PCoA: Linear method. Uses matrix algebra to calculate eigenvalye. Calculate how much variation is explained by each axis. Can choose to view axis 1, 2, 3, etc. and plot them together.
    2. NMDS: Non-linear method. Collapse multiple axes into two (or three) dimensions. Can see more axes of variation into fewer axes. Always need to report a stress value. (Ideally less than 0.15)
  6. Run statistics with PERMANOVA and betadispR.

Setup

Load Libraries

# Install packages if needed

pacman::p_load(tidyverse, devtools, phyloseq, patchwork, vegan, ggpubr, rstatix,
               ggplot2,install = FALSE)

Load in colors

gut_section_colors <- c(
  "IV" = "dodgerblue4",
  "V" = "#FF5733")

Load in data

# Load in rooted phylogenetic tree!
load("data/03_Phylogenetic_Tree/phytree_preprocessed_physeq.RData")
unrooted_physeq
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1736 taxa and 11 samples ]
## sample_data() Sample Data:       [ 11 samples by 23 sample variables ]
## tax_table()   Taxonomy Table:    [ 1736 taxa by 9 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1736 tips and 1734 internal nodes ]
midroot_physeq
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1736 taxa and 11 samples ]
## sample_data() Sample Data:       [ 11 samples by 23 sample variables ]
## tax_table()   Taxonomy Table:    [ 1736 taxa by 9 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1736 tips and 1735 internal nodes ]

Explore Read Counts

Raw Read Depth

# Sequence depth will inform us on how we want to normalize our data
# Calculate the total number of reads per sample
raw_total_seqs_df <-
  unrooted_physeq %>%
  # Calculate the sample read sums
  sample_sums() %>%
  data.frame()

# Name the column
colnames(raw_total_seqs_df)[1] <- "TotalSeqs"

head(raw_total_seqs_df)
##       TotalSeqs
## 568_4     24503
## 568_5     17303
## 581_5     15963
## 611_5     13893
## E05_5     45378
## E1_4      43941
# Make a histogram of raw reads
raw_seqs_histogram <-
  raw_total_seqs_df %>%
  ggplot(aes(x = TotalSeqs)) +
  geom_histogram(bins = 50) +
  scale_x_continuous(limits = c(0, 50000)) +
  labs(title = "Raw Sequencing Depth Distribution") + 
  theme_bw()

Remove lowly sequenced sample

Normalize Read Counts

### scale_reads function and matround function
#################################################################################### 
# Function to scale reads: http://deneflab.github.io/MicrobeMiseq/ 
# Scales reads by 
# 1) taking proportions
# 2) multiplying by a given library size of n
# 3) rounding 
# Default for n is the minimum sample size in your library
# Default for round is floor

matround <- function(x){trunc(x+0.5)}

scale_reads <- function(physeq, n = min(sample_sums(physeq)), round = "round") {
  
  # transform counts to n
  physeq.scale <- transform_sample_counts(physeq, function(x) {(n * x/sum(x))})
  
  # Pick the rounding functions
  if (round == "floor"){
    otu_table(physeq.scale) <- floor(otu_table(physeq.scale))
  } else if (round == "round"){
    otu_table(physeq.scale) <- round(otu_table(physeq.scale))
  } else if (round == "matround"){
    otu_table(physeq.scale) <- matround(otu_table(physeq.scale))
  }
  
  # Prune taxa and return new phyloseq object
  physeq.scale <- prune_taxa(taxa_sums(physeq.scale) > 0, physeq.scale)
  return(physeq.scale)
  
  }

Rescale all reads so they all represent the count of the lowest number of sequence reads. We will expect each sample to have # of reads around 2200

This is where one might decide to use rarefaction to normalize the data.

Scale reads and check the distribution of the seq depth

min(sample_sums(midroot_physeq))
## [1] 7142
# Scale reads by the above function
scaled_rooted_physeq <-
  midroot_physeq %>%
  scale_reads(round = "matround")

# Calculate read depth
## Look at total number of sequences in each sample and compare to what we had before

scaled_total_seqs_df <- 
  scaled_rooted_physeq %>%
  sample_sums() %>%
  data.frame()

head(scaled_total_seqs_df)
##          .
## 568_4 7143
## 568_5 7134
## 581_5 7140
## 611_5 7159
## E05_5 7149
## E1_4  7137
# Change first column name to be "TotalSeqs"
colnames(scaled_total_seqs_df)[1] <- "TotalSeqs"

# Inspect
head(scaled_total_seqs_df)
##       TotalSeqs
## 568_4      7143
## 568_5      7134
## 581_5      7140
## 611_5      7159
## E05_5      7149
## E1_4       7137
# Check range of data
min_seqs <-
  min(scaled_total_seqs_df)
max_seqs <-
 max(scaled_total_seqs_df)
# Range of seqs
max_seqs - min_seqs
## [1] 25
# Plot histogram
scaled_total_seqs_df %>%
  ggplot(aes(x = TotalSeqs)) +
  geom_histogram(bins = 50) +
  scale_x_continuous(limits = c(0, 10000)) +
  labs(title = "Scaled Sequencing Depth at 7131") + 
  theme_bw()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_bar()`).

head(scaled_total_seqs_df)
##       TotalSeqs
## 568_4      7143
## 568_5      7134
## 581_5      7140
## 611_5      7159
## E05_5      7149
## E1_4       7137

Calculate & Visualize Community Dissimiliarity

Exploratory analyses from the Paily & Shankar (2016) paper, which is using unconstrained ordination methods like PCoA.

Sorenson PCoA

# Calculate sorenson dissimularity: Abundance-unweighted of shared taxa
scaled_soren_pcoa <-  
  ordinate(
    physeq = scaled_rooted_physeq,
    method = "PCoA",
    distance = "bray", binary = TRUE)

#str(scaled_soren_pcoa)

# Plot the ordination 
soren_gut_section_pcoa <- plot_ordination(
  physeq = scaled_rooted_physeq,
  ordination = scaled_soren_pcoa,
  color = "gut_section",
  shape = "gut_section",
  title = "Sorensen PCoA") +
  geom_point(size=5, alpha = 0.5, aes(color = gut_section)) +
  scale_shape_manual(values = c(15,16)) +
  scale_color_manual(values = gut_section_colors) + 
  theme_bw()
# Show the plot 
soren_gut_section_pcoa

Here, we are evaluating the shared taxa by presence/absence (abundance-unweighted) in the Sorensen metric.

Note that:

  • Axis 1 = ~31% of variation
  • Axis 2 = ~14% of variation

This means we explain 45% of the variation in the data in these two axes.

Bray-Curtis PCoA

# Calculate the BC distance
scaled_BC_pcoa <- 
  ordinate(
    physeq = scaled_rooted_physeq,
    method = "PCoA",
    distance = "bray")

# Plot the PCoA
bray_gut_section_pcoa <- 
  plot_ordination(
    physeq = scaled_rooted_physeq,
    ordination = scaled_BC_pcoa,
    color = "gut_section",
    shape = "gut_section",
    title = "Bray-Curtis PCoA") +
  geom_point(size=5, alpha = 0.5, aes(color = gut_section)) +
  scale_shape_manual(values = c(15,16)) +
  scale_color_manual(values = c(gut_section_colors)) + 
  theme_bw()
bray_gut_section_pcoa

Here, we are evaluating the shared taxa and then weighting them by their abundances, which provides more influence for species that are more abundant.

Note that:

  • Axis 1 = ~30% of variation
  • Axis 2 = ~14% of variation

This means we explain 44% of the variation in the data in these two axes, which is almost exactly the same as the previous plot with the Sorensen Dissimilarity. Abundance does NOT seem to have an influence!!

Weighted-Unifrac PCoA

# Calculate the BC distance
scaled_wUNI_pcoa <- 
  ordinate(
    physeq = scaled_rooted_physeq,
    method = "PCoA",
    distance = "wunifrac")

# Plot the PCoA
wUNI_gut_section_pcoa <- 
  plot_ordination(
    physeq = scaled_rooted_physeq,
    ordination = scaled_wUNI_pcoa,
    color = "gut_section",
    shape = "gut_section",
    title = "Weighted Unifrac PCoA") +
  geom_point(size=5, alpha = 0.5, aes(color = gut_section)) +
  scale_shape_manual(values = c(15,16,17,18, 19, 20)) +
  scale_color_manual(values = c(gut_section_colors)) + 
  theme_bw()
wUNI_gut_section_pcoa

Here, we are evaluating the shared taxa and then weighting them by their abundances, which provides more influence for species that are more abundant.

Note that:

  • Axis 1 = ~60% of variation
  • Axis 2 = ~22% of variation

This means we explain 82% of the variation in the data in these two axes (!!!), which is significantly more than the previous plots with the taxonomic dissimilarity measures. (Almost double.) Here, phylogeny seems to be very important! This means that taxa that are abundant are found in different places in the phylogenetic tree. Therefore, the evoultionary distances (aka the branch lengths) and their abundances seem to have a major influence!!

Combine PCoAs

Let’s plot all three together into one plot to have a concise visualization of the three metrics.

(soren_gut_section_pcoa + theme(legend.position = "none")) + 
  (bray_gut_section_pcoa + theme(legend.position = "none")) + 
    (wUNI_gut_section_pcoa + theme(legend.position = "none"))

NMDS

Weighted Unifrac

Since we did 3 of the dissimilarity metrics for the PCoA, let’s just plot one example of them for the NMDS plotting. Here, we will use weighted Unifrac

# Calculate the Weighted Unifrac distance
scaled_wUNI_nmds <- 
  ordinate(
    physeq = scaled_rooted_physeq,
    method = "NMDS",
    distance = "wunifrac")
## Run 0 stress 0.01900662 
## Run 1 stress 0.01946183 
## ... Procrustes: rmse 0.0335616  max resid 0.05043899 
## Run 2 stress 0.01822508 
## ... New best solution
## ... Procrustes: rmse 0.06233217  max resid 0.1340897 
## Run 3 stress 0.01900654 
## Run 4 stress 0.01900668 
## Run 5 stress 0.01822508 
## ... Procrustes: rmse 2.263515e-05  max resid 3.765399e-05 
## ... Similar to previous best
## Run 6 stress 0.01946177 
## Run 7 stress 0.0190066 
## Run 8 stress 0.0194618 
## Run 9 stress 0.01946179 
## Run 10 stress 0.01900672 
## Run 11 stress 0.01822507 
## ... New best solution
## ... Procrustes: rmse 6.743571e-06  max resid 1.412201e-05 
## ... Similar to previous best
## Run 12 stress 0.01946177 
## Run 13 stress 0.01900664 
## Run 14 stress 0.01822507 
## ... Procrustes: rmse 4.311612e-06  max resid 6.659228e-06 
## ... Similar to previous best
## Run 15 stress 0.01822508 
## ... Procrustes: rmse 1.612603e-05  max resid 3.554646e-05 
## ... Similar to previous best
## Run 16 stress 0.0190068 
## Run 17 stress 0.01900659 
## Run 18 stress 0.04554822 
## Run 19 stress 0.01900675 
## Run 20 stress 0.01822507 
## ... New best solution
## ... Procrustes: rmse 3.223273e-06  max resid 5.459342e-06 
## ... Similar to previous best
## *** Best solution repeated 1 times
# Plot the PCoA
wUNI_gut_section_nmds <- 
  plot_ordination(
    physeq = scaled_rooted_physeq,
    ordination = scaled_wUNI_nmds,
    color = "gut_section") +
  geom_point(size=5, alpha = 0.5, aes(color = gut_section)) +
  scale_shape_manual(values = c(15,16)) +
  scale_color_manual(values = c(gut_section_colors)) + 
  theme_bw() + labs(color = "Gut Section")
wUNI_gut_section_nmds

We can see from above the plot that the stress value is ~0.02, which is very acceptable. And, It seems important to emphasize that the PCoA and the NMDS plot both look pretty similar!

In this case, I would always prefer to report the PCoA results because they are linear and provide a lot more post-hoc analyses to follow up with. In addition, it’s helpful to only have 2 axes of variation and show how much variation is explained.

(wUNI_gut_section_pcoa + theme(legend.position = "none")) + 
  (wUNI_gut_section_nmds + theme(legend.position = "none"))

Statistical Significance Testing

PERMANOVA

# Calculate all three of the distance matrices
scaled_sorensen_dist <- phyloseq::distance(scaled_rooted_physeq, method = "bray", binary = TRUE)
scaled_bray_dist <- phyloseq::distance(scaled_rooted_physeq, method = "bray")
scaled_wUnifrac_dist <- phyloseq::distance(scaled_rooted_physeq, method = "wunifrac")

# make a data frame from the sample_data
# All distance matrices will be the same metadata because they 
# originate from the same phyloseq object. 
metadata <- data.frame(sample_data(scaled_rooted_physeq))

# Adonis test
# In this example we are testing the hypothesis that the five stations
# that were collected have different centroids in the ordination space 
# for each of the dissimilarity metrics, we are using a discrete variable 
adonis2(scaled_sorensen_dist ~ gut_section, data = metadata)
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = scaled_sorensen_dist ~ gut_section, data = metadata)
##          Df SumOfSqs     R2      F Pr(>F)   
## Model     1  0.69653 0.2419 2.8718  0.004 **
## Residual  9  2.18283 0.7581                 
## Total    10  2.87936 1.0000                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis2(scaled_bray_dist ~ gut_section, data = metadata)
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = scaled_bray_dist ~ gut_section, data = metadata)
##          Df SumOfSqs      R2      F Pr(>F)   
## Model     1  0.74126 0.26788 3.2931  0.002 **
## Residual  9  2.02588 0.73212                 
## Total    10  2.76714 1.00000                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis2(scaled_wUnifrac_dist ~ gut_section, data = metadata)
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = scaled_wUnifrac_dist ~ gut_section, data = metadata)
##          Df SumOfSqs     R2      F Pr(>F)   
## Model     1  0.29380 0.5474 10.885  0.005 **
## Residual  9  0.24292 0.4526                 
## Total    10  0.53672 1.0000                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that:

  • R2 = the percent variation explained.
  • F = the F-Statistic, which represents the importance value.
  • Pr(>F) = the pvalue

Above, we see that the most variation is explained by the weighted unifrac, which explains ~55% of the variation in the data and also has the highest F-statistic.

# We might also care about other variables
# Here, we will add date and fraction as variables
# multiplicative model ORDER MATTERS! 
adonis2(scaled_sorensen_dist ~ gut_section * year, data = metadata)
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = scaled_sorensen_dist ~ gut_section * year, data = metadata)
##          Df SumOfSqs      R2      F Pr(>F)  
## Model     3   1.1959 0.41533 1.6576  0.026 *
## Residual  7   1.6835 0.58467                
## Total    10   2.8794 1.00000                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis2(scaled_bray_dist ~ gut_section * year, data = metadata)
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = scaled_bray_dist ~ gut_section * year, data = metadata)
##          Df SumOfSqs      R2      F Pr(>F)  
## Model     3   1.1664 0.42151 1.7002  0.011 *
## Residual  7   1.6007 0.57849                
## Total    10   2.7671 1.00000                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Note that the ORDER MATTERS!
adonis2(scaled_wUnifrac_dist ~ gut_section * year, data = metadata)
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = scaled_wUnifrac_dist ~ gut_section * year, data = metadata)
##          Df SumOfSqs      R2      F Pr(>F)   
## Model     3  0.32720 0.60963 3.6439  0.005 **
## Residual  7  0.20952 0.39037                 
## Total    10  0.53672 1.00000                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis2(scaled_wUnifrac_dist ~ year * gut_section, data = metadata)
## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = scaled_wUnifrac_dist ~ year * gut_section, data = metadata)
##          Df SumOfSqs      R2      F Pr(>F)   
## Model     3  0.32720 0.60963 3.6439  0.008 **
## Residual  7  0.20952 0.39037                 
## Total    10  0.53672 1.00000                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can also run tests that include additive (+) or multipliciatve models, which include the interaction term between variables.

BetaDispR

The PERMANOVA is sensitive to variance/dispersion in the data. Therefore, we need to run a homogeneity of dispersion test to test for the sensitivity of our PERMANOVA results to variance.

# Homogeneity of Disperson test with beta dispr
# Sorensen 
beta_soren_gut_section <- betadisper(scaled_sorensen_dist, metadata$gut_section)
permutest(beta_soren_gut_section)
## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
## 
## Response: Distances
##           Df   Sum Sq   Mean Sq      F N.Perm Pr(>F)
## Groups     1 0.008910 0.0089097 2.3074    999  0.199
## Residuals  9 0.034753 0.0038615
# Bray-curtis 
beta_bray_gut_section <- betadisper(scaled_bray_dist, metadata$gut_section)
permutest(beta_bray_gut_section)
## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
## 
## Response: Distances
##           Df   Sum Sq   Mean Sq      F N.Perm Pr(>F)
## Groups     1 0.000370 0.0003697 0.0611    999  0.823
## Residuals  9 0.054473 0.0060526
# Weighted Unifrac 
beta_bray_gut_section <- betadisper(scaled_wUnifrac_dist, metadata$gut_section)
permutest(beta_bray_gut_section)
## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
## 
## Response: Distances
##           Df   Sum Sq   Mean Sq      F N.Perm Pr(>F)
## Groups     1 0.008195 0.0081948 1.2269    999  0.305
## Residuals  9 0.060112 0.0066791

Above, our variance is not impacted by gut section.

Taxonomic Composition

Phylum

# Set the phylum colors
phylum_colors <- c(
  Acidobacteriota = "navy", 
  Actinobacteriota = "darkslategray2", 
  Armatimonadota = "deeppink1",
  Alphaproteobacteria = "plum2", 
  Bacteroidota = "gold", 
  Betaproteobacteria = "plum1", 
  Bdellovibrionota = "red1",
  Chloroflexi="black", 
  Crenarchaeota = "firebrick",
  Cyanobacteria = "limegreen",
  Deltaproteobacteria = "grey", 
  Desulfobacterota="magenta",
  Fibrobacterota = "darkgreen",
  Bacillota = "#3E9B96",
  Gammaproteobacteria = "greenyellow",
  "Marinimicrobia (SAR406 clade)" = "yellow",
  Myxococcota = "#B5D6AA",
  Nitrospirota = "palevioletred1",
  Pseudomonadota = "royalblue",
  Planctomycetota = "darkorange", 
  Spirochaetota = "olivedrab",
  Thermoplasmatota = "green",
  Verrucomicrobiota = "darkorchid1")

Plot Phylum Composition

# Goal is to calculate the phylum relative abundance
# Note: the read depth must be normalized in some way: scaled_reads
level_order <- 
  c('568_4','E1_4','E2_4','E3_4','568_5', '581_5','611_5', 'E05_5', 'E1_5',
    'E2_5', 'E3_5') 

phylum_df <-
  scaled_rooted_physeq %>%
  # agglomerate at the phylum level
  tax_glom(taxrank = "Phylum") %>%
  # Transform counts to relative abundance
  transform_sample_counts(function (x) {x/sum(x)}) %>%
  # Melt to a long format
  psmelt() %>%
  # Filter out phyla that are less than one percent - get rid of low abundant Phyla
  dplyr::filter(Abundance > 0.01)
  # fix the order of date
  
# Stacked bar plot with all Phyla
# Plot Phylum Abundances - make sure to load phylum_colors
phylum_df %>%
  # Warning: Its important to have one sample per x value,
  # Otherwise, it will take the sum between multiple samples
  ggplot(aes(x = factor(names, level = level_order),
             y = Abundance, fill = Phylum)) +
  facet_grid(.~gut_section) + 
  geom_bar(stat = "identity", color = "black") +
  labs(title = "Gut Section Phylum Composition") +
  scale_fill_manual(values = phylum_colors) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1))

# Plot Phylum Abundances - make sure to load phylum_colors
phylum_df %>%
  # Warning: Its important to have one sample per x value,
  # Otherwise, it will take the sum between multiple samples
  ggplot(aes(x = factor(names, level = level_order),
             y = Abundance, fill = Phylum)) +
  geom_bar(stat = "identity", color = "black") +
  labs(title = "Gut Section Phylum Composition") + 
  scale_fill_manual(values = phylum_colors) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1)) +
  labs(y ="Relative Abundance",x = "Sample")

# Make each Phyla its own row
#fixed scale
phylum_df %>%
  ggplot(aes(x = names, y = Abundance, fill = Phylum)) +
  facet_grid(Phylum~gut_section) + 
  geom_bar(stat = "identity", color = "black") +
  labs(title = "Gut Section Phylum Composition") + 
  scale_fill_manual(values = phylum_colors) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1))

# free scale
phylum_df %>%
  ggplot(aes(x = names, y = Abundance, fill = Phylum)) +
  facet_grid(Phylum~gut_section, scale = "free") + 
  geom_bar(stat = "identity", color = "black") +
  labs(title = "Gut Section Phylum Composition") + 
  scale_fill_manual(values = phylum_colors) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1))

# Narrow in on a specific group

# Bacteroidota - y: abundance, x: gut section, dot plot + boxplot
Bacteroidota_phylum <-
  phylum_df %>%
  dplyr::filter(Phylum == "Bacteroidota") %>%
  # build the plot
  ggplot(aes(x = gut_section, y = Abundance, fill = gut_section, color = gut_section)) + 
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # outliers not plotted here in boxplot
  geom_jitter() +
  theme_bw() +
  labs(title = "Bacteroidota Phylum Abundance") +
  scale_color_manual(values = gut_section_colors) +
  scale_fill_manual(values = gut_section_colors) +
  stat_compare_means(paired = FALSE)
Bacteroidota_phylum

# Cyanobacteria - y: abundance, x: gut section, dot plot + boxplot
Cyanobacteria_phylum <-
  phylum_df %>%
  dplyr::filter(Phylum == "Cyanobacteria") %>%
  # build the plot
  ggplot(aes(x = gut_section, y = Abundance, fill = gut_section, color = gut_section)) + 
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # outliers not plotted here in boxplot
  geom_jitter() +
  theme_bw() +
  labs(title = "Cyanobacteria Phylum Abundance") +
  scale_color_manual(values = gut_section_colors) +
  scale_fill_manual(values = gut_section_colors) +
  stat_compare_means(paired = FALSE)
Cyanobacteria_phylum
## Warning: Computation failed in `stat_compare_means()`.
## Caused by error:
## ! argument "x" is missing, with no default

# Desulfobacterota - y: abundance, x: gut section, dot plot + boxplot
Desulfobacterota_phylum <-
  phylum_df %>%
  dplyr::filter(Phylum == "Desulfobacterota") %>%
  # build the plot
  ggplot(aes(x = gut_section, y = Abundance, fill = gut_section, color = gut_section)) + 
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # outliers not plotted here in boxplot
  geom_jitter() +
  theme_bw() +
  labs(title = "Desulfobacterota Phylum Abundance") +
  scale_color_manual(values = gut_section_colors) +
  scale_fill_manual(values = gut_section_colors) +
  stat_compare_means()
Desulfobacterota_phylum

# Fibrobacterota - y: abundance, x: gut section, dot plot + boxplot
Fibrobacterota_phylum <-
  phylum_df %>%
  dplyr::filter(Phylum == "Fibrobacterota") %>%
  # build the plot
  ggplot(aes(x = gut_section, y = Abundance, fill = gut_section, color = gut_section)) + 
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # outliers not plotted here in boxplot
  geom_jitter() +
  theme_bw() +
  labs(title = "Fibrobacterota Phylum Abundance") +
  scale_color_manual(values = gut_section_colors) +
  scale_fill_manual(values = gut_section_colors) +
  stat_compare_means()
Fibrobacterota_phylum
## Warning: Computation failed in `stat_compare_means()`.
## Caused by error:
## ! argument "x" is missing, with no default

# Firmicutes - y: abundance, x: gut section, dot plot + boxplot
Bacillota_phylum <-
  phylum_df %>%
  dplyr::filter(Phylum == "Bacillota") %>%
  # build the plot
  ggplot(aes(x = gut_section, y = Abundance, fill = gut_section, color = gut_section)) + 
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # outliers not plotted here in boxplot
  geom_jitter() +
  theme_bw() +
  labs(title = "Bacillota Phylum Abundance") +
  scale_color_manual(values = gut_section_colors) +
  scale_fill_manual(values = gut_section_colors) +
  stat_compare_means()
Bacillota_phylum

# Proteobacteria - y: abundance, x: gut section, dot plot + boxplot
Pseudomonadota_phylum <-
  phylum_df %>%
  dplyr::filter(Phylum == "Pseudomonadota") %>%
  # build the plot
  ggplot(aes(x = gut_section, y = Abundance, fill = gut_section, color = gut_section)) + 
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # outliers not plotted here in boxplot
  geom_jitter() +
  theme_bw() +
  labs(title = "Pseudomonadota Phylum Abundance") +
  scale_color_manual(values = gut_section_colors) +
  scale_fill_manual(values = gut_section_colors) +
  stat_compare_means()
Pseudomonadota_phylum

# Spirochaetota- y: abundance, x: gut section, dot plot + boxplot
Spirochaetota_phylum <-
  phylum_df %>%
  dplyr::filter(Phylum == "Spirochaetota") %>%
  # build the plot
  ggplot(aes(x = gut_section, y = Abundance, fill = gut_section, color = gut_section)) + 
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # outliers not plotted here in boxplot
  geom_jitter() +
  theme_bw() +
  labs(title = "Spirochaetota Phylum Abundance") +
  scale_color_manual(values = gut_section_colors) +
  scale_fill_manual(values = gut_section_colors)+
  stat_compare_means()
Spirochaetota_phylum
## Warning: Computation failed in `stat_compare_means()`.
## Caused by error:
## ! argument "x" is missing, with no default

# Verrucomicrobiota - y: abundance, x: gut section, dot plot + boxplot
Verrucomicrobiota_phylum <-
  phylum_df %>%
  dplyr::filter(Phylum == "Verrucomicrobiota") %>%
  # build the plot
  ggplot(aes(x = gut_section, y = Abundance, fill = gut_section, color = gut_section)) + 
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # outliers not plotted here in boxplot
  geom_jitter() +
  theme_bw() +
  labs(title = "Verrucomicrobiota Phylum Abundance") +
  scale_color_manual(values = gut_section_colors) +
  scale_fill_manual(values = gut_section_colors) +
  stat_compare_means()
Verrucomicrobiota_phylum

Order

# Set Order colors
order_colors <- c(
  Bacteroidales = "darkslategray2", 
  Christensenellales = "gold",
  "Clostridia vadinBB60 group" = "deeppink1",
  Clostridiales = "limegreen",  
  Cytophagales = "gray",
  Desulfobacterales = "magenta4", 
  Desulfovibrionales = "black",
  Enterobacterales = "firebrick",
  Fibrobacterales = "yellow",
  Gastranaerophilales = "greenyellow",
  Lachnospirales = "yellow4", 
  Opitutales ="magenta",
  Oscillospirales = "palevioletred1",
  "P.palmC41"  = "orange3",
  "Peptostreptococcales-Tissierellales" = "lightgreen",
  Pseudomonadales = "purple2",
  Spirochaetales = "royalblue",
  Verrucomicrobiales  = "plum2",
  Victivallales = "darkgray",
  "WCHB1-41"=  "violetred"
)
Order_df <-
  scaled_rooted_physeq %>%
  # agglomerate at the phylum level
  tax_glom(taxrank = "Order") %>%
  # Transform counts to relative abundance
  transform_sample_counts(function (x) {x/sum(x)}) %>%
  # Melt to a long format
  psmelt() %>%
  # Filter out phyla that are less than one percent - get rid of low abundant Phyla
  dplyr::filter(Abundance > 0.01)

#section_IV <-
#  c('568_4','E1_4','E2_4','E3_4')

#section_IV_df <- filter(Order_df, names == c('568_4','E1_4','E2_4','E3_4') )

#section_V <-
#  c('568_5', '581_5','611_5', 'E05_5', 'E1_5', 'E2_5', 'E3_5')

#section_V_df <- filter(Order_df, names == section_V )

#save(Order_df, file = "data/05_community_analysis/ASV_counts_order.RData")

#write.csv2(Order_df, file = "data/05_community_analysis/ASV_counts_order.csv")

#write.table(Order_df, "data/05_community_analysis/ASV_counts_order.tsv",
#            sep = "\t", quote = FALSE, col.names = NA)

# Facet by gut section
Order_df %>%
  # Warning: Its important to have one sample per x value,
  # Otherwise, it will take the sum between multiple samples
  ggplot(aes(x = names, y = Abundance, fill = Order)) +
  facet_grid(.~gut_section) + 
  geom_bar(stat = "identity", color = "black") +
  #scale_fill_manual(values = order_colors) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1))

level_order <- 
  c('568_4','E1_4','E2_4','E3_4','568_5', '581_5','611_5', 'E05_5', 'E1_5', 'E2_5', 'E3_5') 

Order_df %>%
  # Warning: Its important to have one sample per x value,
  # Otherwise, it will take the sum between multiple samples
  ggplot(aes(x = factor(names, level = level_order), 
             y = Abundance, fill = Order)) +
  #facet_grid(.~gut_section) + 
  geom_bar(stat = "identity", color = "black") +
  scale_fill_manual(values = order_colors) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1)) +
  labs(y ="Relative Abundance",x = "Sample")

# Section IV

section_IV_df <- read.table("data/05_community_analysis/section_IV.tsv", sep = ",", skip = 1)

head(section_IV_df)
##   V1     V2    V3         V4    V5                      V6 V7
## 1  1  ASV_4 568_4 0.61878762 568_4 Pomacanthus sexstriatus IV
## 2 17  ASV_1 568_4 0.27061459 568_4 Pomacanthus sexstriatus IV
## 3 65 ASV_14 568_4 0.02841943 568_4 Pomacanthus sexstriatus IV
## 4 72  ASV_2 568_4 0.02379952 568_4 Pomacanthus sexstriatus IV
## 5 78 ASV_35 568_4 0.02183956 568_4 Pomacanthus sexstriatus IV
## 6 79  ASV_9 568_4 0.02183956 568_4 Pomacanthus sexstriatus IV
##                   V8                 V9  V10 V11 V12  V13   V14 V15 V16   V17
## 1 Great Barrier Reef South Island South 2014  12  12 S246 15:30 236 285 654.3
## 2 Great Barrier Reef South Island South 2014  12  12 S246 15:30 236 285 654.3
## 3 Great Barrier Reef South Island South 2014  12  12 S246 15:30 236 285 654.3
## 4 Great Barrier Reef South Island South 2014  12  12 S246 15:30 236 285 654.3
## 5 Great Barrier Reef South Island South 2014  12  12 S246 15:30 236 285 654.3
## 6 Great Barrier Reef South Island South 2014  12  12 S246 15:30 236 285 654.3
##    V18 V19    V20   V21   V22   V23   V24   V25   V26      V27      V28
## 1 52.5   M Sample 52502 25983 25713 25724 24787 24522 46.70679 Bacteria
## 2 52.5   M Sample 52502 25983 25713 25724 24787 24522 46.70679 Bacteria
## 3 52.5   M Sample 52502 25983 25713 25724 24787 24522 46.70679 Bacteria
## 4 52.5   M Sample 52502 25983 25713 25724 24787 24522 46.70679 Bacteria
## 5 52.5   M Sample 52502 25983 25713 25724 24787 24522 46.70679 Bacteria
## 6 52.5   M Sample 52502 25983 25713 25724 24787 24522 46.70679 Bacteria
##              V29                 V30                                 V31
## 1 Pseudomonadota GammaPseudomonadota                     Pseudomonadales
## 2   Bacteroidota         Bacteroidia                        Cytophagales
## 3 Pseudomonadota GammaPseudomonadota                    Enterobacterales
## 4      Bacillota          Clostridia Peptostreptococcales-Tissierellales
## 5      Bacillota          Clostridia                     Oscillospirales
## 6      Bacillota          Clostridia                      Lachnospirales
colnames(section_IV_df) <- c("", "OTU", "Sample", "Abundance", "names", 
                             "host_species", "gut_section", "region","loction", "year",
                             "month", "day", "sample_lab", "Time", "SL.mm.", "FL.mm.",
                             "TW..g.", "GW..g.", "Sex", "Sample_or_Control", 
                             "input", "filtered", "denoisedF", "denoisedR", 
                             "merged", "nochim", "perc_reads_retained", "Kingdom",
                             "Phylum", "Class", "Order")
head(section_IV_df)
##         OTU Sample  Abundance names            host_species gut_section
## 1  1  ASV_4  568_4 0.61878762 568_4 Pomacanthus sexstriatus          IV
## 2 17  ASV_1  568_4 0.27061459 568_4 Pomacanthus sexstriatus          IV
## 3 65 ASV_14  568_4 0.02841943 568_4 Pomacanthus sexstriatus          IV
## 4 72  ASV_2  568_4 0.02379952 568_4 Pomacanthus sexstriatus          IV
## 5 78 ASV_35  568_4 0.02183956 568_4 Pomacanthus sexstriatus          IV
## 6 79  ASV_9  568_4 0.02183956 568_4 Pomacanthus sexstriatus          IV
##               region            loction year month day sample_lab  Time SL.mm.
## 1 Great Barrier Reef South Island South 2014    12  12       S246 15:30    236
## 2 Great Barrier Reef South Island South 2014    12  12       S246 15:30    236
## 3 Great Barrier Reef South Island South 2014    12  12       S246 15:30    236
## 4 Great Barrier Reef South Island South 2014    12  12       S246 15:30    236
## 5 Great Barrier Reef South Island South 2014    12  12       S246 15:30    236
## 6 Great Barrier Reef South Island South 2014    12  12       S246 15:30    236
##   FL.mm. TW..g. GW..g. Sex Sample_or_Control input filtered denoisedF denoisedR
## 1    285  654.3   52.5   M            Sample 52502    25983     25713     25724
## 2    285  654.3   52.5   M            Sample 52502    25983     25713     25724
## 3    285  654.3   52.5   M            Sample 52502    25983     25713     25724
## 4    285  654.3   52.5   M            Sample 52502    25983     25713     25724
## 5    285  654.3   52.5   M            Sample 52502    25983     25713     25724
## 6    285  654.3   52.5   M            Sample 52502    25983     25713     25724
##   merged nochim perc_reads_retained  Kingdom         Phylum               Class
## 1  24787  24522            46.70679 Bacteria Pseudomonadota GammaPseudomonadota
## 2  24787  24522            46.70679 Bacteria   Bacteroidota         Bacteroidia
## 3  24787  24522            46.70679 Bacteria Pseudomonadota GammaPseudomonadota
## 4  24787  24522            46.70679 Bacteria      Bacillota          Clostridia
## 5  24787  24522            46.70679 Bacteria      Bacillota          Clostridia
## 6  24787  24522            46.70679 Bacteria      Bacillota          Clostridia
##                                 Order
## 1                     Pseudomonadales
## 2                        Cytophagales
## 3                    Enterobacterales
## 4 Peptostreptococcales-Tissierellales
## 5                     Oscillospirales
## 6                      Lachnospirales
section_IV_df %>%
  ggplot(aes(x = names, 
             y = Abundance, fill = Order)) +
  geom_bar(stat = "identity", color = "black") +
  scale_fill_manual(values = order_colors) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1)) +
  labs(y ="Relative Abundance",x = "Sample")

# Section V

section_V_df <- read.table("data/05_community_analysis/section_V.tsv", sep = ",", skip = 1)

head(section_V_df)
##   V1     V2    V3         V4    V5                      V6 V7
## 1 12  ASV_4 568_5 0.31185531 568_5 Pomacanthus sexstriatus  V
## 2 18 ASV_16 568_5 0.26875795 568_5 Pomacanthus sexstriatus  V
## 3 40 ASV_35 568_5 0.05553201 568_5 Pomacanthus sexstriatus  V
## 4 41 ASV_28 568_5 0.05355377 568_5 Pomacanthus sexstriatus  V
## 5 42 ASV_12 568_5 0.05143422 568_5 Pomacanthus sexstriatus  V
## 6 58  ASV_1 568_5 0.03490179 568_5 Pomacanthus sexstriatus  V
##                   V8                                  V9  V10 V11 V12  V13
## 1 Great Barrier Reef Between Bird Islet and South Island 2014  12  12 S346
## 2 Great Barrier Reef Between Bird Islet and South Island 2014  12  12 S346
## 3 Great Barrier Reef Between Bird Islet and South Island 2014  12  12 S346
## 4 Great Barrier Reef Between Bird Islet and South Island 2014  12  12 S346
## 5 Great Barrier Reef Between Bird Islet and South Island 2014  12  12 S346
## 6 Great Barrier Reef Between Bird Islet and South Island 2014  12  12 S346
##     V14 V15 V16   V17  V18 V19    V20   V21   V22   V23   V24   V25   V26
## 1 15:30 236 285 654.3 52.5   M Sample 38610 18739 18165 18254 17389 17305
## 2 15:30 236 285 654.3 52.5   M Sample 38610 18739 18165 18254 17389 17305
## 3 15:30 236 285 654.3 52.5   M Sample 38610 18739 18165 18254 17389 17305
## 4 15:30 236 285 654.3 52.5   M Sample 38610 18739 18165 18254 17389 17305
## 5 15:30 236 285 654.3 52.5   M Sample 38610 18739 18165 18254 17389 17305
## 6 15:30 236 285 654.3 52.5   M Sample 38610 18739 18165 18254 17389 17305
##        V27      V28              V29                 V30
## 1 44.81999 Bacteria   Pseudomonadota GammaPseudomonadota
## 2 44.81999 Bacteria     Bacteroidota         Bacteroidia
## 3 44.81999 Bacteria        Bacillota          Clostridia
## 4 44.81999 Bacteria        Bacillota          Clostridia
## 5 44.81999 Bacteria Desulfobacterota    Desulfovibrionia
## 6 44.81999 Bacteria     Bacteroidota         Bacteroidia
##                          V31
## 1            Pseudomonadales
## 2              Bacteroidales
## 3            Oscillospirales
## 4 Clostridia vadinBB60 group
## 5         Desulfovibrionales
## 6               Cytophagales
colnames(section_V_df) <- c("", "OTU", "Sample", "Abundance", "names", 
                             "host_species", "gut_section", "region","loction", "year",
                             "month", "day", "sample_lab", "Time", "SL.mm.", "FL.mm.",
                             "TW..g.", "GW..g.", "Sex", "Sample_or_Control", 
                             "input", "filtered", "denoisedF", "denoisedR", 
                             "merged", "nochim", "perc_reads_retained", "Kingdom",
                             "Phylum", "Class", "Order")
head(section_V_df)
##         OTU Sample  Abundance names            host_species gut_section
## 1 12  ASV_4  568_5 0.31185531 568_5 Pomacanthus sexstriatus           V
## 2 18 ASV_16  568_5 0.26875795 568_5 Pomacanthus sexstriatus           V
## 3 40 ASV_35  568_5 0.05553201 568_5 Pomacanthus sexstriatus           V
## 4 41 ASV_28  568_5 0.05355377 568_5 Pomacanthus sexstriatus           V
## 5 42 ASV_12  568_5 0.05143422 568_5 Pomacanthus sexstriatus           V
## 6 58  ASV_1  568_5 0.03490179 568_5 Pomacanthus sexstriatus           V
##               region                             loction year month day
## 1 Great Barrier Reef Between Bird Islet and South Island 2014    12  12
## 2 Great Barrier Reef Between Bird Islet and South Island 2014    12  12
## 3 Great Barrier Reef Between Bird Islet and South Island 2014    12  12
## 4 Great Barrier Reef Between Bird Islet and South Island 2014    12  12
## 5 Great Barrier Reef Between Bird Islet and South Island 2014    12  12
## 6 Great Barrier Reef Between Bird Islet and South Island 2014    12  12
##   sample_lab  Time SL.mm. FL.mm. TW..g. GW..g. Sex Sample_or_Control input
## 1       S346 15:30    236    285  654.3   52.5   M            Sample 38610
## 2       S346 15:30    236    285  654.3   52.5   M            Sample 38610
## 3       S346 15:30    236    285  654.3   52.5   M            Sample 38610
## 4       S346 15:30    236    285  654.3   52.5   M            Sample 38610
## 5       S346 15:30    236    285  654.3   52.5   M            Sample 38610
## 6       S346 15:30    236    285  654.3   52.5   M            Sample 38610
##   filtered denoisedF denoisedR merged nochim perc_reads_retained  Kingdom
## 1    18739     18165     18254  17389  17305            44.81999 Bacteria
## 2    18739     18165     18254  17389  17305            44.81999 Bacteria
## 3    18739     18165     18254  17389  17305            44.81999 Bacteria
## 4    18739     18165     18254  17389  17305            44.81999 Bacteria
## 5    18739     18165     18254  17389  17305            44.81999 Bacteria
## 6    18739     18165     18254  17389  17305            44.81999 Bacteria
##             Phylum               Class                      Order
## 1   Pseudomonadota GammaPseudomonadota            Pseudomonadales
## 2     Bacteroidota         Bacteroidia              Bacteroidales
## 3        Bacillota          Clostridia            Oscillospirales
## 4        Bacillota          Clostridia Clostridia vadinBB60 group
## 5 Desulfobacterota    Desulfovibrionia         Desulfovibrionales
## 6     Bacteroidota         Bacteroidia               Cytophagales
section_V_df %>%
  ggplot(aes(x = names, 
             y = Abundance, fill = Order)) +
  geom_bar(stat = "identity", color = "black") +
  scale_fill_manual(values = order_colors) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1)) +
  labs(y ="Relative Abundance",x = "Sample")

Genus

# Set the phylum colors
Genus_colors <- c(
  Akkermansia = "navy",  
  "Christensenellaceae R-7 group" = "greenyellow", 
  Alistipes = "deeppink1",
  Aureibacter = "plum2", 
  Endozoicomonas = "gold", 
   Desulfobotulus= "magenta4", 
  Fibrobacter = "red1",
  Desulfovibrio="black", 
  "dgA-11 gut group" = "firebrick",
  Epulopiscium = "limegreen",
  Ferrimonas = "grey", 
  Odoribacter ="magenta",
  Rikenella = "darkslategray2",
  "Rikenellaceae RC9 gut group" = "yellow",
  Ruminococcus = "#B5D6AA",
  Romboutsia = "palevioletred1",
  Treponema = "royalblue",
  Vibrio = "darkolivegreen1",
  Other = "darkgray")
Genus_df <-
  scaled_rooted_physeq %>%
  # agglomerate at the phylum level
  tax_glom(taxrank = "Genus") %>%
  # Transform counts to relative abundance
  transform_sample_counts(function (x) {x/sum(x)}) %>%
  # Melt to a long format
  psmelt() %>%
  # Filter out phyla that are less than one percent - get rid of low abundant Phyla
  dplyr::filter(Abundance > 0.01)
  
# Facet by gut section
Genus_df %>%
  # Warning: Its important to have one sample per x value,
  # Otherwise, it will take the sum between multiple samples
  ggplot(aes(x = names, y = Abundance, fill = Genus)) +
  facet_grid(.~gut_section) + 
  geom_bar(stat = "identity", color = "black") +
  scale_fill_manual(values = Genus_colors) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1))

level_order <- 
  c('568_4','E1_4','E2_4','E3_4','568_5', '581_5','611_5', 'E05_5', 
    'E1_5', 'E2_5', 'E3_5') 

Genus_df %>%
  # Warning: Its important to have one sample per x value,
  # Otherwise, it will take the sum between multiple samples
  ggplot(aes(x = factor(names, level = level_order), 
             y = Abundance, fill = Genus)) +
  #facet_grid(.~gut_section) + 
  geom_bar(stat = "identity", color = "black") +
  scale_fill_manual(values = Genus_colors) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1)) +
  labs(y ="Relative Abundance", x = "Sample")

# Bacteroidota
Genus_df %>%
  dplyr::filter(Phylum == "Bacteroidota") %>%
  # build the plot
  ggplot(aes(x = gut_section, y = Abundance,
             fill = gut_section, color = gut_section)) + 
  facet_wrap(.~Genus, scales = "free_y", nrow = 1) +
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # outliers not plotted here in boxplot
  geom_jitter() +
  theme_bw() +
  labs(title = "Bacteroidota genus Abundance") +
  scale_color_manual(values = gut_section_colors) +
  scale_fill_manual(values = gut_section_colors) +
   stat_compare_means()
## Warning: Computation failed in `stat_compare_means()`.
## Computation failed in `stat_compare_means()`.
## Computation failed in `stat_compare_means()`.
## Computation failed in `stat_compare_means()`.
## Caused by error:
## ! argument "x" is missing, with no default

# Bacillota
Genus_df %>%
  dplyr::filter(Phylum == "Bacillota") %>%
  # build the plot
  ggplot(aes(x = gut_section, y = Abundance,
             fill = gut_section, color = gut_section)) + 
  facet_wrap(.~Genus, scales = "free_y", nrow = 1) +
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # outliers not plotted here in boxplot
  geom_jitter() +
  theme_bw() +
  labs(title = "Bacillota Genus Abundance") +
  scale_color_manual(values = gut_section_colors) +
  scale_fill_manual(values = gut_section_colors) 

# Pseudomonadota
Genus_df %>%
  dplyr::filter(Phylum == "Pseudomonadota") %>%
  # build the plot
  ggplot(aes(x = gut_section, y = Abundance,
             fill = gut_section, color = gut_section)) + 
  facet_wrap(.~Genus, scales = "free_y", nrow = 1) +
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # outliers not plotted here in boxplot
  geom_jitter() +
  theme_bw() +
  labs(title = "Pseudomonadota Genus Abundance") +
  scale_color_manual(values = gut_section_colors) +
  scale_fill_manual(values = gut_section_colors)

# Akkermansia
Genus_df %>%
  dplyr::filter(Genus == "Akkermansia") %>%
  # build the plot
  ggplot(aes(x = gut_section, y = Abundance,
             fill = gut_section, color = gut_section)) + 
  facet_wrap(.~Genus, scales = "free_y", nrow = 1) +
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # outliers not plotted here in boxplot
  geom_jitter() +
  theme_bw() +
  labs(title = "Akkermansia Genus Abundance") +
  scale_color_manual(values = gut_section_colors) +
  scale_fill_manual(values = gut_section_colors) #+

  #stat_compare_means(paired = FALSE)


# Desulfovibrio
Genus_df %>%
  dplyr::filter(Genus == "Desulfovibrio") %>%
  # build the plot
  ggplot(aes(x = gut_section, y = Abundance,
             fill = gut_section, color = gut_section)) + 
  facet_wrap(.~Genus, scales = "free_y", nrow = 1) +
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # outliers not plotted here in boxplot
  geom_jitter() +
  theme_bw() +
  labs(title = "Desulfovibrio Genus Abundance") +
  scale_color_manual(values = gut_section_colors) +
  scale_fill_manual(values = gut_section_colors) +
  stat_compare_means()

# dg A -11 gut group
Genus_df %>%
  dplyr::filter(Genus == "dgA-11 gut group") %>%
  # build the plot
  ggplot(aes(x = gut_section, y = Abundance,
             fill = gut_section, color = gut_section)) + 
  facet_wrap(.~Genus, scales = "free_y", nrow = 1) +
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # outliers not plotted here in boxplot
  geom_jitter() +
  theme_bw() +
  labs(title = "dgA- 11 gut group Genus Abundance") +
  scale_color_manual(values = gut_section_colors) +
  scale_fill_manual(values = gut_section_colors) 

#Fibrobacter
Genus_df %>%
  dplyr::filter(Genus == "Fibrobacter") %>%
  # build the plot
  ggplot(aes(x = gut_section, y = Abundance,
             fill = gut_section, color = gut_section)) + 
  facet_wrap(.~Genus, scales = "free_y", nrow = 1) +
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # outliers not plotted here in boxplot
  geom_jitter() +
  theme_bw() +
  labs(title = "Fibrobacter Genus Abundance") +
  scale_color_manual(values = gut_section_colors) +
  scale_fill_manual(values = gut_section_colors) 

#Treponema
Genus_df %>%
  dplyr::filter(Genus == "Treponema") %>%
  # build the plot
  ggplot(aes(x = gut_section, y = Abundance,
             fill = gut_section, color = gut_section)) + 
  facet_wrap(.~Genus, scales = "free_y", nrow = 1) +
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  # outliers not plotted here in boxplot
  geom_jitter() +
  theme_bw() +
  labs(title = "Treponema Genus Abundance") +
  scale_color_manual(values = gut_section_colors) +
  scale_fill_manual(values = gut_section_colors) 

Session Information

For Reproducibility

#Ensure reproducibility
devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.2 (2024-10-31)
##  os       macOS Sequoia 15.6
##  system   x86_64, darwin20
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       America/New_York
##  date     2025-08-11
##  pandoc   3.4 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/x86_64/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package          * version   date (UTC) lib source
##  abind              1.4-8     2024-09-12 [1] CRAN (R 4.4.1)
##  ade4               1.7-22    2023-02-06 [1] CRAN (R 4.4.0)
##  ape                5.8-1     2024-12-16 [1] CRAN (R 4.4.1)
##  backports          1.5.0     2024-05-23 [1] CRAN (R 4.4.0)
##  Biobase            2.62.0    2023-10-24 [1] Bioconductor
##  BiocGenerics       0.52.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.1)
##  biomformat         1.34.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.1)
##  Biostrings         2.74.1    2024-12-16 [1] Bioconductor 3.20 (R 4.4.2)
##  bitops             1.0-9     2024-10-03 [1] CRAN (R 4.4.1)
##  broom              1.0.7     2024-09-26 [1] CRAN (R 4.4.1)
##  bslib              0.8.0     2024-07-29 [1] CRAN (R 4.4.0)
##  cachem             1.1.0     2024-05-16 [1] CRAN (R 4.4.0)
##  car                3.1-3     2024-09-27 [1] CRAN (R 4.4.1)
##  carData            3.0-5     2022-01-06 [1] CRAN (R 4.4.0)
##  cli                3.6.3     2024-06-21 [1] CRAN (R 4.4.0)
##  cluster            2.1.6     2023-12-01 [2] CRAN (R 4.4.2)
##  codetools          0.2-20    2024-03-31 [2] CRAN (R 4.4.2)
##  colorspace         2.1-1     2024-07-26 [1] CRAN (R 4.4.0)
##  crayon             1.5.3     2024-06-20 [1] CRAN (R 4.4.0)
##  data.table         1.16.4    2024-12-06 [1] CRAN (R 4.4.1)
##  devtools         * 2.4.5     2022-10-11 [1] CRAN (R 4.4.0)
##  digest             0.6.37    2024-08-19 [1] CRAN (R 4.4.1)
##  dplyr            * 1.1.4     2023-11-17 [1] CRAN (R 4.4.0)
##  ellipsis           0.3.2     2021-04-29 [1] CRAN (R 4.4.0)
##  evaluate           1.0.1     2024-10-10 [1] CRAN (R 4.4.1)
##  farver             2.1.2     2024-05-13 [1] CRAN (R 4.4.0)
##  fastmap            1.2.0     2024-05-15 [1] CRAN (R 4.4.0)
##  forcats          * 1.0.0     2023-01-29 [1] CRAN (R 4.4.0)
##  foreach            1.5.2     2022-02-02 [1] CRAN (R 4.4.0)
##  Formula            1.2-5     2023-02-24 [1] CRAN (R 4.4.0)
##  fs                 1.6.5     2024-10-30 [1] CRAN (R 4.4.1)
##  generics           0.1.3     2022-07-05 [1] CRAN (R 4.4.0)
##  GenomeInfoDb       1.38.8    2024-03-15 [1] Bioconductor 3.18 (R 4.4.2)
##  GenomeInfoDbData   1.2.11    2025-01-09 [1] Bioconductor
##  ggplot2          * 3.5.1     2024-04-23 [1] CRAN (R 4.4.0)
##  ggpubr           * 0.6.0     2023-02-10 [1] CRAN (R 4.4.0)
##  ggsignif           0.6.4     2022-10-13 [1] CRAN (R 4.4.0)
##  glue               1.8.0     2024-09-30 [1] CRAN (R 4.4.1)
##  gtable             0.3.6     2024-10-25 [1] CRAN (R 4.4.1)
##  hms                1.1.3     2023-03-21 [1] CRAN (R 4.4.0)
##  htmltools          0.5.8.1   2024-04-04 [1] CRAN (R 4.4.0)
##  htmlwidgets        1.6.4     2023-12-06 [1] CRAN (R 4.4.0)
##  httpuv             1.6.15    2024-03-26 [1] CRAN (R 4.4.0)
##  igraph             2.1.2     2024-12-07 [1] CRAN (R 4.4.1)
##  IRanges            2.40.1    2024-12-05 [1] Bioconductor 3.20 (R 4.4.2)
##  iterators          1.0.14    2022-02-05 [1] CRAN (R 4.4.0)
##  jquerylib          0.1.4     2021-04-26 [1] CRAN (R 4.4.0)
##  jsonlite           1.8.9     2024-09-20 [1] CRAN (R 4.4.1)
##  knitr              1.49      2024-11-08 [1] CRAN (R 4.4.1)
##  labeling           0.4.3     2023-08-29 [1] CRAN (R 4.4.0)
##  later              1.4.1     2024-11-27 [1] CRAN (R 4.4.1)
##  lattice          * 0.22-6    2024-03-20 [2] CRAN (R 4.4.2)
##  lifecycle          1.0.4     2023-11-07 [1] CRAN (R 4.4.0)
##  lubridate        * 1.9.4     2024-12-08 [1] CRAN (R 4.4.1)
##  magrittr           2.0.3     2022-03-30 [1] CRAN (R 4.4.0)
##  MASS               7.3-61    2024-06-13 [2] CRAN (R 4.4.2)
##  Matrix             1.7-1     2024-10-18 [2] CRAN (R 4.4.2)
##  memoise            2.0.1     2021-11-26 [1] CRAN (R 4.4.0)
##  mgcv               1.9-1     2023-12-21 [2] CRAN (R 4.4.2)
##  mime               0.12      2021-09-28 [1] CRAN (R 4.4.0)
##  miniUI             0.1.1.1   2018-05-18 [1] CRAN (R 4.4.0)
##  multtest           2.58.0    2023-10-24 [1] Bioconductor
##  munsell            0.5.1     2024-04-01 [1] CRAN (R 4.4.0)
##  nlme               3.1-166   2024-08-14 [2] CRAN (R 4.4.2)
##  pacman             0.5.1     2019-03-11 [1] CRAN (R 4.4.0)
##  patchwork        * 1.3.0     2024-09-16 [1] CRAN (R 4.4.1)
##  permute          * 0.9-7     2022-01-27 [1] CRAN (R 4.4.0)
##  phyloseq         * 1.50.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.1)
##  pillar             1.10.1    2025-01-07 [1] CRAN (R 4.4.1)
##  pkgbuild           1.4.5     2024-10-28 [1] CRAN (R 4.4.1)
##  pkgconfig          2.0.3     2019-09-22 [1] CRAN (R 4.4.0)
##  pkgload            1.4.0     2024-06-28 [1] CRAN (R 4.4.0)
##  plyr               1.8.9     2023-10-02 [1] CRAN (R 4.4.0)
##  profvis            0.4.0     2024-09-20 [1] CRAN (R 4.4.1)
##  promises           1.3.2     2024-11-28 [1] CRAN (R 4.4.1)
##  purrr            * 1.0.2     2023-08-10 [1] CRAN (R 4.4.0)
##  R6                 2.5.1     2021-08-19 [1] CRAN (R 4.4.0)
##  Rcpp               1.0.13-1  2024-11-02 [1] CRAN (R 4.4.1)
##  RCurl              1.98-1.16 2024-07-11 [1] CRAN (R 4.4.0)
##  readr            * 2.1.5     2024-01-10 [1] CRAN (R 4.4.0)
##  remotes            2.5.0     2024-03-17 [1] CRAN (R 4.4.0)
##  reshape2           1.4.4     2020-04-09 [1] CRAN (R 4.4.0)
##  rhdf5              2.50.1    2024-12-09 [1] Bioconductor 3.20 (R 4.4.2)
##  rhdf5filters       1.18.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.1)
##  Rhdf5lib           1.28.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.1)
##  rlang              1.1.4     2024-06-04 [1] CRAN (R 4.4.0)
##  rmarkdown          2.29      2024-11-04 [1] CRAN (R 4.4.1)
##  rstatix          * 0.7.2     2023-02-01 [1] CRAN (R 4.4.0)
##  rstudioapi         0.17.1    2024-10-22 [1] CRAN (R 4.4.1)
##  S4Vectors          0.44.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.1)
##  sass               0.4.9     2024-03-15 [1] CRAN (R 4.4.0)
##  scales             1.3.0     2023-11-28 [1] CRAN (R 4.4.0)
##  sessioninfo        1.2.2     2021-12-06 [1] CRAN (R 4.4.0)
##  shiny              1.10.0    2024-12-14 [1] CRAN (R 4.4.1)
##  stringi            1.8.4     2024-05-06 [1] CRAN (R 4.4.0)
##  stringr          * 1.5.1     2023-11-14 [1] CRAN (R 4.4.0)
##  survival           3.7-0     2024-06-05 [2] CRAN (R 4.4.2)
##  tibble           * 3.2.1     2023-03-20 [1] CRAN (R 4.4.0)
##  tidyr            * 1.3.1     2024-01-24 [1] CRAN (R 4.4.0)
##  tidyselect         1.2.1     2024-03-11 [1] CRAN (R 4.4.0)
##  tidyverse        * 2.0.0     2023-02-22 [1] CRAN (R 4.4.0)
##  timechange         0.3.0     2024-01-18 [1] CRAN (R 4.4.0)
##  tzdb               0.4.0     2023-05-12 [1] CRAN (R 4.4.0)
##  urlchecker         1.0.1     2021-11-30 [1] CRAN (R 4.4.0)
##  usethis          * 3.1.0     2024-11-26 [1] CRAN (R 4.4.1)
##  vctrs              0.6.5     2023-12-01 [1] CRAN (R 4.4.0)
##  vegan            * 2.6-8     2024-08-28 [1] CRAN (R 4.4.1)
##  withr              3.0.2     2024-10-28 [1] CRAN (R 4.4.1)
##  xfun               0.50      2025-01-07 [1] CRAN (R 4.4.1)
##  xtable             1.8-4     2019-04-21 [1] CRAN (R 4.4.0)
##  XVector            0.42.0    2023-10-24 [1] Bioconductor
##  yaml               2.3.10    2024-07-26 [1] CRAN (R 4.4.0)
##  zlibbioc           1.48.2    2024-03-13 [1] Bioconductor 3.18 (R 4.4.2)
## 
##  [1] /Users/cab565/Library/R/x86_64/4.4/library
##  [2] /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library
## 
## ──────────────────────────────────────────────────────────────────────────────